Overview

Dataset statistics

Number of variables: 23
Number of observations: 65188
Missing cells: 64952
Missing cells (%): 4.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 62.8 MiB
Average record size in memory: 1010.2 B
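
The two derived figures in this table can be recomputed from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that 1 MiB is 1024 × 1024 bytes; the function names are illustrative and are not part of pandas-profiling.

```python
# Figures copied from the Dataset statistics table.
N_VARIABLES = 23
N_OBSERVATIONS = 65188
MISSING_CELLS = 64952
TOTAL_SIZE_MIB = 62.8


def missing_cell_fraction(missing, n_vars, n_obs):
    """Share of missing cells among all cells (assumed definition)."""
    return missing / (n_vars * n_obs)


def average_record_bytes(total_mib, n_obs):
    """Average bytes per row, assuming 1 MiB = 1024 * 1024 bytes."""
    return total_mib * 1024 * 1024 / n_obs


frac = missing_cell_fraction(MISSING_CELLS, N_VARIABLES, N_OBSERVATIONS)
print(f"Missing cells: {frac:.1%}")
print(f"Average record size: {average_record_bytes(TOTAL_SIZE_MIB, N_OBSERVATIONS):.1f} B")
```

Under these assumptions both values round to the figures reported above (4.3% and 1010.2 B).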

Variable types

CAT: 14
NUM: 7
BOOL: 1
UNSUPPORTED: 1

Reproduction

Analysis started: 2020-05-04 13:25:23.037182
Analysis finished: 2020-05-04 13:29:10.818671
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
REF has a high cardinality: 866 distinct values High cardinality
ALT has a high cardinality: 458 distinct values High cardinality
CLNHGVS has a high cardinality: 65188 distinct values High cardinality
MC has a high cardinality: 90 distinct values High cardinality
Allele has a high cardinality: 374 distinct values High cardinality
EXON has a high cardinality: 3264 distinct values High cardinality
cDNA_position has a high cardinality: 13970 distinct values High cardinality
CDS_position has a high cardinality: 13663 distinct values High cardinality
Protein_position has a high cardinality: 7339 distinct values High cardinality
Amino_acids has a high cardinality: 1262 distinct values High cardinality
Codons has a high cardinality: 2220 distinct values High cardinality
CADD_RAW is highly correlated with CADD_PHRED High Correlation
CADD_PHRED is highly correlated with CADD_RAW High Correlation
MC has 846 (1.3%) missing values Missing
EXON has 8893 (13.6%) missing values Missing
cDNA_position has 8884 (13.6%) missing values Missing
CDS_position has 9955 (15.3%) missing values Missing
Protein_position has 9955 (15.3%) missing values Missing
Amino_acids has 10004 (15.3%) missing values Missing
Codons has 10004 (15.3%) missing values Missing
LoFtool has 4213 (6.5%) missing values Missing
CADD_PHRED has 1092 (1.7%) missing values Missing
CADD_RAW has 1092 (1.7%) missing values Missing
CHROM is an unsupported type, check if it needs cleaning or further analysis Rejected
AF_ESP has 35781 (54.9%) zeros Zeros
AF_EXAC has 24047 (36.9%) zeros Zeros
AF_TGP has 37972 (58.2%) zeros Zeros

Variables

CHROM
Unsupported

REJECTED
UNSUPPORTED
Missing: 0
Missing (%): 0.0%
Memory size: 509.4 KiB

POS
Real number (ℝ≥0)

Distinct count: 63115
Unique (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 77575938.96
Minimum: 961
Maximum: 247607973
Zeros: 0
Zeros (%): 0.0%
Memory size: 509.4 KiB

Quantile statistics

Minimum: 961
5-th percentile: 4876674.45
Q1: 32541793
Median: 57970213
Q3: 112745411.2
95-th percentile: 187122313.8
Maximum: 247607973
Range: 247607012
Interquartile range (IQR): 80203618.25

Descriptive statistics

Standard deviation: 59740509.88
Coefficient of variation (CV): 0.7700907096
Kurtosis: -0.1906181669
Mean: 77575938.96
Median Absolute Deviation (MAD): 50014175.48
Skewness: 0.8029306643
Sum: 5.057020309e+12
Variance: 3.568928521e+15
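
Several of the POS statistics are derived from the others: range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, and variance = standard deviation². The sketch below recomputes them from the values printed in this report. It is only a consistency check under those textbook definitions; the dictionary keys and function name are illustrative.

```python
import math

# Values copied from the POS summary (already rounded by the report).
pos = {
    "min": 961,
    "max": 247607973,
    "q1": 32541793,
    "q3": 112745411.2,
    "mean": 77575938.96,
    "std": 59740509.88,
}


def derived_stats(s):
    """Recompute the derived statistics from the primary ones."""
    return {
        "range": s["max"] - s["min"],
        "iqr": s["q3"] - s["q1"],
        "cv": s["std"] / s["mean"],
        "variance": s["std"] ** 2,
    }


d = derived_stats(pos)
# Compare against the reported figures, allowing for rounding in the report.
print(d["range"] == 247607012)
print(math.isclose(d["iqr"], 80203618.25, rel_tol=1e-6))
print(math.isclose(d["cv"], 0.7700907096, rel_tol=1e-6))
print(math.isclose(d["variance"], 3.568928521e15, rel_tol=1e-6))
```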
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[9.61000000e+02 1.57170000e+04 2.18465000e+05 2.18490500e+05 2.23595500e+05 ... 2.47582272e+08 2.47582350e+08 2.47587376e+08 2.47588864e+08 2.47607973e+08], "bayesian blocks" binning strategy used)
Value Count Frequency (%)
89876827 11 < 0.1%
179578108 9 < 0.1%
73613031 8 < 0.1%
92944314 7 < 0.1%
103629803 7 < 0.1%
17697093 6 < 0.1%
51175655 6 < 0.1%
25031776 6 < 0.1%
11097199 5 < 0.1%
98270646 5 < 0.1%
Other values (63105) 65118 99.9%

Value Count Frequency (%)
961 1 < 0.1%
1291 1 < 0.1%
1393 1 < 0.1%
1462 1 < 0.1%
3243 1 < 0.1%

Value Count Frequency (%)
247607973 1 < 0.1%
247607371 1 < 0.1%
247592912 1 < 0.1%
247588869 1 < 0.1%
247588858 1 < 0.1%

REF
Categorical

HIGH CARDINALITY
Distinct count: 866
Unique (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 509.4 KiB

C 21798
G 21361
A 9845
T 9421
CT 126
Other values (861) 2637

Value Count Frequency (%)
C 21798 33.4%
G 21361 32.8%
A 9845 15.1%
T 9421 14.5%
CT 126 0.2%
GC 113 0.2%
TG 105 0.2%
AG 104 0.2%
AC 103 0.2%
GA 91 0.1%
Other values (856) 2121 3.3%

Length

Max length: 127
Mean length: 1.174863472
Min length: 1

Value Count Frequency (%)
Uppercase_Letter 4 100.0%

Value Count Frequency (%)
Latin 4 100.0%

Value Count Frequency (%)
ASCII 4 100.0%

ALT
Categorical

HIGH CARDINALITY
Distinct count: 458
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 509.4 KiB

T 20409
A 20205
G 11782
C 11429
TA 118
Other values (453) 1245

Value Count Frequency (%)
T 20409 31.3%
A 20205 31.0%
G 11782 18.1%
C 11429 17.5%
TA 118 0.2%
CT 93 0.1%
CA 77 0.1%
AT 75 0.1%
GA 67 0.1%
GT 64 0.1%
Other values (448) 869 1.3%

Length

Max length: 100
Mean length: 1.072359944
Min length: 1

Value Count Frequency (%)
Uppercase_Letter 4 100.0%

Value Count Frequency (%)
Latin 4 100.0%

Value Count Frequency (%)
ASCII 4 100.0%

AF_ESP
Real number (ℝ≥0)

ZEROS
Distinct count: 2842
Unique (%): 4.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.01451052188
Minimum: 0
Maximum: 0.499
Zeros: 35781
Zeros (%): 54.9%
Memory size: 509.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0.0012
95-th percentile: 0.075765
Maximum: 0.499
Range: 0.499
Interquartile range (IQR): 0.0012

Descriptive statistics

Standard deviation: 0.05779541015
Coefficient of variation (CV): 3.983000105
Kurtosis: 32.06061665
Mean: 0.01451052188
Median Absolute Deviation (MAD): 0.02422527929
Skewness: 5.465588287
Sum: 945.9119
Variance: 0.003340309435
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.0000e+00 5.0000e-05 1.5000e-04 2.5000e-04 3.5000e-04 ... 1.0415e-01 1.7640e-01 2.0745e-01 3.2555e-01 4.9900e-01], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
0 35781 54.9%
0.0001 3924 6.0%
0.0002 3199 4.9%
0.0003 1110 1.7%
0.0005 994 1.5%
0.0004 819 1.3%
0.0009 637 1.0%
0.0006 523 0.8%
0.0007 457 0.7%
0.0008 440 0.7%
Other values (2832) 17304 26.5%

Value Count Frequency (%)
0 35781 54.9%
0.0001 3924 6.0%
0.0002 3199 4.9%
0.0003 1110 1.7%
0.0004 819 1.3%

Value Count Frequency (%)
0.499 1 < 0.1%
0.4989 1 < 0.1%
0.4986 1 < 0.1%
0.4985 1 < 0.1%
0.4979 1 < 0.1%

AF_EXAC
Real number (ℝ≥0)

ZEROS
Distinct count: 6667
Unique (%): 10.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0144921754
Minimum: 0
Maximum: 0.49989
Zeros: 24047
Zeros (%): 36.9%
Memory size: 509.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 4e-05
Q3: 0.00123
95-th percentile: 0.0768395
Maximum: 0.49989
Range: 0.49989
Interquartile range (IQR): 0.00123

Descriptive statistics

Standard deviation: 0.05954209632
Coefficient of variation (CV): 4.108568568
Kurtosis: 31.33022126
Mean: 0.0144921754
Median Absolute Deviation (MAD): 0.02460664223
Skewness: 5.434358612
Sum: 944.71593
Variance: 0.003545261235
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 5.00000e-06 1.50000e-05 2.50000e-05 3.50000e-05 ... 1.07935e-01 1.35870e-01 2.13700e-01 4.00425e-01 4.99890e-01], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
0 24047 36.9%
1e-05 3263 5.0%
3e-05 2321 3.6%
2e-05 2037 3.1%
4e-05 1013 1.6%
5e-05 881 1.4%
7e-05 841 1.3%
6e-05 665 1.0%
8e-05 663 1.0%
0.00012 492 0.8%
Other values (6657) 28965 44.4%

Value Count Frequency (%)
0 24047 36.9%
1e-05 3263 5.0%
2e-05 2037 3.1%
3e-05 2321 3.6%
4e-05 1013 1.6%

Value Count Frequency (%)
0.49989 1 < 0.1%
0.49974 1 < 0.1%
0.49967 1 < 0.1%
0.49962 1 < 0.1%
0.4996 1 < 0.1%

AF_TGP
Real number (ℝ≥0)

ZEROS
Distinct count: 2087
Unique (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.01526349635
Minimum: 0
Maximum: 0.4998
Zeros: 37972
Zeros (%): 58.2%
Memory size: 509.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0.0016
95-th percentile: 0.0837
Maximum: 0.4998
Range: 0.4998
Interquartile range (IQR): 0.0016

Descriptive statistics

Standard deviation: 0.05952740749
Coefficient of variation (CV): 3.899985045
Kurtosis: 30.43211765
Mean: 0.01526349635
Median Absolute Deviation (MAD): 0.02538392379
Skewness: 5.328800456
Sum: 994.9968
Variance: 0.003543512242
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000e+00 1.000e-04 2.500e-04 3.500e-04 4.500e-04 ... 7.920e-02 1.125e-01 1.644e-01 2.723e-01 4.998e-01], "bayesian blocks" binning strategy used)

Value Count Frequency (%)
0 37972 58.2%
0.0002 3786 5.8%
0.0004 2073 3.2%
0.0006 1352 2.1%
0.0008 1059 1.6%
0.001 872 1.3%
0.0012 679 1.0%
0.0014 609 0.9%
0.0016 584 0.9%
0.0018 472 0.7%
Other values (2077) 15730 24.1%

Value Count Frequency (%)
0 37972 58.2%
0.0002 3786 5.8%
0.0003 129 0.2%
0.0004 2073 3.2%
0.0005 85 0.1%

Value Count Frequency (%)
0.4998 1 < 0.1%
0.4994 1 < 0.1%
0.499 1 < 0.1%
0.4984 1 < 0.1%
0.4976 1 < 0.1%

CLNHGVS
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE
Distinct count: 65188
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 509.4 KiB

NC_000001.10:g.201043755G>A 1
NC_000007.13:g.6013143C>T 1
NC_000002.11:g.215645514A>C 1
NC_000015.9:g.93547852A>G 1
NC_000001.10:g.228345567C>T 1
Other values (65183) 65183

Value Count Frequency (%)
NC_000001.10:g.201043755G>A 1 < 0.1%
NC_000007.13:g.6013143C>T 1 < 0.1%
NC_000002.11:g.215645514A>C 1 < 0.1%
NC_000015.9:g.93547852A>G 1 < 0.1%
NC_000001.10:g.228345567C>T 1 < 0.1%
NC_000019.9:g.11105560C>A 1 < 0.1%
NC_000016.9:g.56388893C>T 1 < 0.1%
NC_000013.10:g.32913537G>A 1 < 0.1%
NC_000001.10:g.115218260A>T 1 < 0.1%
NC_000016.9:g.50733392T>A 1 < 0.1%
Other values (65178) 65178 > 99.9%

Length

Max length: 102
Mean length: 26.42133522
Min length: 20

Value Count Frequency (%)
Lowercase_Letter 11 34.4%
Decimal_Number 10 31.2%
Uppercase_Letter 5 15.6%
Other_Punctuation 2 6.2%
Close_Punctuation 1 3.1%
Math_Symbol 1 3.1%
Open_Punctuation 1 3.1%
Connector_Punctuation 1 3.1%

Value Count Frequency (%)
Latin 16 50.0%
Common 16 50.0%

Value Count Frequency (%)
ASCII 32 100.0%
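
Because every CLNHGVS value is unique, the column is more useful when split into its parts than as a category. The sketch below handles only simple substitutions of the form shown in the value table (accession, `g.`, position, `REF>ALT`). The pattern and function name are illustrative assumptions, and other HGVS forms that appear in the data, such as `dup` and `del` ranges, are deliberately left unparsed.

```python
import re

# Matches simple genomic substitutions such as "NC_000001.10:g.201043755G>A".
# Duplications, deletions and position ranges are not handled by this pattern.
_SUBSTITUTION = re.compile(
    r"^(?P<accession>NC_\d+\.\d+):g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_substitution(hgvs):
    """Return accession, position, ref and alt, or None if not a simple substitution."""
    m = _SUBSTITUTION.match(hgvs)
    if m is None:
        return None
    return {
        "accession": m.group("accession"),
        "pos": int(m.group("pos")),
        "ref": m.group("ref"),
        "alt": m.group("alt"),
    }


print(parse_substitution("NC_000001.10:g.201043755G>A"))
```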
 

CLNVC
Categorical

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 509.4 KiB

single_nucleotide_variant 61281
Deletion 2509
Duplication 1034
Indel 247
Insertion 95
Other values (2) 22

Value Count Frequency (%)
single_nucleotide_variant 61281 94.0%
Deletion 2509 3.8%
Duplication 1034 1.6%
Indel 247 0.4%
Insertion 95 0.1%
Inversion 17 < 0.1%
Microsatellite 5 < 0.1%

Length

Max length: 25
Mean length: 24.01951279
Min length: 5

Value Count Frequency (%)
Lowercase_Letter 15 78.9%
Uppercase_Letter 3 15.8%
Connector_Punctuation 1 5.3%

Value Count Frequency (%)
Latin 18 94.7%
Common 1 5.3%

Value Count Frequency (%)
ASCII 19 100.0%

MC
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 90
Unique (%): 0.1%
Missing: 846
Missing (%): 1.3%
Memory size: 509.4 KiB

SO:0001583|missense_variant 28457
SO:0001819|synonymous_variant 16549
SO:0001627|intron_variant 7534
SO:0001583|missense_variant,SO:0001627|intron_variant 2803
SO:0001589|frameshift_variant 1622
Other values (85) 7377

Value Count Frequency (%)
SO:0001583|missense_variant 28457 43.7%
SO:0001819|synonymous_variant 16549 25.4%
SO:0001627|intron_variant 7534 11.6%
SO:0001583|missense_variant,SO:0001627|intron_variant 2803 4.3%
SO:0001589|frameshift_variant 1622 2.5%
SO:0001587|nonsense 1573 2.4%
SO:0001627|intron_variant,SO:0001819|synonymous_variant 1148 1.8%
SO:0001583|missense_variant,SO:0001623|5_prime_UTR_variant 724 1.1%
SO:0001623|5_prime_UTR_variant 516 0.8%
SO:0001575|splice_donor_variant 504 0.8%
Other values (80) 2912 4.5%
(Missing) 846 1.3%

Length

Max length: 121
Mean length: 30.09966558
Min length: 3

Value Count Frequency (%)
Lowercase_Letter 19 47.5%
Decimal_Number 10 25.0%
Uppercase_Letter 7 17.5%
Other_Punctuation 2 5.0%
Connector_Punctuation 1 2.5%
Math_Symbol 1 2.5%

Value Count Frequency (%)
Latin 26 65.0%
Common 14 35.0%

Value Count Frequency (%)
ASCII 40 100.0%
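
Much of MC's cardinality comes from combined values: as the value table shows, one cell can hold several comma-separated `accession|term` pairs. A minimal sketch of splitting such a cell follows, assuming `,` separates entries and `|` separates the accession from the term, as in the values listed here. The function name is illustrative.

```python
def parse_mc(value):
    """Split an MC string into (SO accession, consequence term) pairs."""
    pairs = []
    for item in value.split(","):
        # partition keeps everything after the first "|" as the term
        accession, _, term = item.partition("|")
        pairs.append((accession, term))
    return pairs


print(parse_mc("SO:0001583|missense_variant,SO:0001627|intron_variant"))
```

Counting the individual terms after such a split would give a much smaller vocabulary than the 90 distinct raw strings.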
 

CLASS
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 509.4 KiB

0 48754
1 16434

Value Count Frequency (%)
0 48754 74.8%
1 16434 25.2%

Allele
Categorical

HIGH CARDINALITY
Distinct count: 374
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 509.4 KiB

T 19991
A 19800
G 11397
C 10761
- 2510
Other values (369) 729

Value Count Frequency (%)
T 19991 30.7%
A 19800 30.4%
G 11397 17.5%
C 10761 16.5%
- 2510 3.9%
AA 46 0.1%
TT 38 0.1%
AT 20 < 0.1%
CA 19 < 0.1%
CT 17 < 0.1%
Other values (364) 589 0.9%

Length

Max length: 99
Mean length: 1.054979444
Min length: 1

Value Count Frequency (%)
Uppercase_Letter 4 80.0%
Dash_Punctuation 1 20.0%

Value Count Frequency (%)
Latin 4 80.0%
Common 1 20.0%

Value Count Frequency (%)
ASCII 5 100.0%

IMPACT
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 509.4 KiB

MODERATE 33212
LOW 21642
MODIFIER 5582
HIGH 4752

Value Count Frequency (%)
MODERATE 33212 50.9%
LOW 21642 33.2%
MODIFIER 5582 8.6%
HIGH 4752 7.3%

Length

Max length: 8
Mean length: 6.048444499
Min length: 3

Value Count Frequency (%)
Uppercase_Letter 13 100.0%

Value Count Frequency (%)
Latin 13 100.0%

Value Count Frequency (%)
ASCII 13 100.0%

EXON
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 3264
Unique (%): 5.8%
Missing: 8893
Missing (%): 13.6%
Memory size: 509.4 KiB

16/16 1129
11/27 807
4/10 752
3/3 581
2/2 570
Other values (3259) 52456

Value Count Frequency (%)
16/16 1129 1.7%
11/27 807 1.2%
4/10 752 1.2%
3/3 581 0.9%
2/2 570 0.9%
10/24 525 0.8%
326/363 368 0.6%
4/13 368 0.6%
4/4 354 0.5%
4/11 342 0.5%
Other values (3254) 50499 77.5%
(Missing) 8893 13.6%

Length

Max length: 7
Mean length: 4.305700436
Min length: 3

Value Count Frequency (%)
Decimal_Number 10 76.9%
Lowercase_Letter 2 15.4%
Other_Punctuation 1 7.7%

Value Count Frequency (%)
Common 11 84.6%
Latin 2 15.4%

Value Count Frequency (%)
ASCII 13 100.0%
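
EXON is flagged as high-cardinality because each value packs two numbers into one string. The sketch below splits a value such as `11/27` into two integers, assuming the format means "exon number / total exons in the transcript". That reading is inferred from the value shapes and is not stated in this report. The function name is illustrative.

```python
import re

_EXON = re.compile(r"(\d+)/(\d+)")


def parse_exon(value):
    """Return (exon_number, total_exons), or None if the value has another shape."""
    m = _EXON.fullmatch(value)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


print(parse_exon("11/27"))
```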
 

cDNA_position
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 13970
Unique (%): 24.8%
Missing: 8884
Missing (%): 13.6%
Memory size: 509.4 KiB

852 31
878 30
1201 29
789 29
729 29
Other values (13965) 56156

Value Count Frequency (%)
852 31 < 0.1%
878 30 < 0.1%
1201 29 < 0.1%
789 29 < 0.1%
729 29 < 0.1%
433 29 < 0.1%
452 29 < 0.1%
432 28 < 0.1%
327 28 < 0.1%
566 28 < 0.1%
Other values (13960) 56014 85.9%
(Missing) 8884 13.6%

Length

Max length: 13
Mean length: 3.839924526
Min length: 1

Value Count Frequency (%)
Decimal_Number 10 71.4%
Lowercase_Letter 2 14.3%
Dash_Punctuation 1 7.1%
Other_Punctuation 1 7.1%

Value Count Frequency (%)
Common 12 85.7%
Latin 2 14.3%

Value Count Frequency (%)
ASCII 14 100.0%

CDS_position
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 13663
Unique (%): 24.7%
Missing: 9955
Missing (%): 15.3%
Memory size: 509.4 KiB

1 36
696 35
465 32
379 32
402 31
Other values (13658) 55067

Value Count Frequency (%)
1 36 0.1%
696 35 0.1%
465 32 < 0.1%
379 32 < 0.1%
402 31 < 0.1%
507 30 < 0.1%
3 30 < 0.1%
606 30 < 0.1%
27 30 < 0.1%
207 30 < 0.1%
Other values (13653) 54917 84.2%
(Missing) 9955 15.3%

Length

Max length: 13
Mean length: 3.734935878
Min length: 1

Value Count Frequency (%)
Decimal_Number 10 71.4%
Lowercase_Letter 2 14.3%
Dash_Punctuation 1 7.1%
Other_Punctuation 1 7.1%

Value Count Frequency (%)
Common 12 85.7%
Latin 2 14.3%

Value Count Frequency (%)
ASCII 14 100.0%

Protein_position
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 7339
Unique (%): 13.3%
Missing: 9955
Missing (%): 15.3%
Memory size: 509.4 KiB

1 100
27 80
127 78
69 75
12 74
Other values (7334) 54826

Value Count Frequency (%)
1 100 0.2%
27 80 0.1%
127 78 0.1%
69 75 0.1%
12 74 0.1%
158 73 0.1%
11 72 0.1%
38 71 0.1%
90 71 0.1%
57 71 0.1%
Other values (7329) 54468 83.6%
(Missing) 9955 15.3%

Length

Max length: 11
Mean length: 3.261919372
Min length: 1

Value Count Frequency (%)
Decimal_Number 10 71.4%
Lowercase_Letter 2 14.3%
Dash_Punctuation 1 7.1%
Other_Punctuation 1 7.1%

Value Count Frequency (%)
Common 12 85.7%
Latin 2 14.3%

Value Count Frequency (%)
ASCII 14 100.0%

Amino_acids
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 1262
Unique (%): 2.3%
Missing: 10004
Missing (%): 15.3%
Memory size: 509.4 KiB

A 2005
L 2003
P 1858
S 1710
T 1677
Other values (1257) 45931

Value Count Frequency (%)
A 2005 3.1%
L 2003 3.1%
P 1858 2.9%
S 1710 2.6%
T 1677 2.6%
R/Q 1421 2.2%
R/H 1350 2.1%
G 1121 1.7%
R/C 1118 1.7%
A/T 1042 1.6%
Other values (1252) 39879 61.2%
(Missing) 10004 15.3%

Length

Max length: 45
Mean length: 2.494140026
Min length: 1

Value Count Frequency (%)
Uppercase_Letter 21 80.8%
Lowercase_Letter 2 7.7%
Other_Punctuation 2 7.7%
Dash_Punctuation 1 3.8%

Value Count Frequency (%)
Latin 23 88.5%
Common 3 11.5%

Value Count Frequency (%)
ASCII 26 100.0%

Codons
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 2220
Unique (%): 4.0%
Missing: 10004
Missing (%): 15.3%
Memory size: 509.4 KiB

cGg/cAg 915
Cgg/Tgg 852
cGc/cAc 769
Cga/Tga 734
Cgc/Tgc 730
Other values (2215) 51184

Value Count Frequency (%)
cGg/cAg 915 1.4%
Cgg/Tgg 852 1.3%
cGc/cAc 769 1.2%
Cga/Tga 734 1.1%
Cgc/Tgc 730 1.1%
gaC/gaT 701 1.1%
gcG/gcA 681 1.0%
Gtg/Atg 680 1.0%
ccG/ccA 643 1.0%
gcC/gcT 614 0.9%
Other values (2210) 47865 73.4%
(Missing) 10004 15.3%

Length

Max length: 133
Mean length: 6.490013499
Min length: 3

Value Count Frequency (%)
Lowercase_Letter 5 45.5%
Uppercase_Letter 4 36.4%
Dash_Punctuation 1 9.1%
Other_Punctuation 1 9.1%

Value Count Frequency (%)
Latin 9 81.8%
Common 2 18.2%

Value Count Frequency (%)
ASCII 11 100.0%
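
In the Codons values listed here, each single-codon entry has the form `ref/alt` with one uppercase letter per side, for example `cGg/cAg`. The sketch below assumes the uppercase letters mark the altered base, which is an interpretation of the value format and is not documented in this report. It returns the changed positions for single-codon substitutions and skips everything else, such as `-/AAG`. The function name is illustrative.

```python
def changed_positions(codons):
    """Return (index, ref_base, alt_base) for each uppercase position.

    Only single-codon substitutions (3 bases on each side) are handled;
    insertions, deletions and multi-codon changes return None.
    """
    ref, _, alt = codons.partition("/")
    if len(ref) != 3 or len(alt) != 3:
        return None
    return [
        (i, r, a)
        for i, (r, a) in enumerate(zip(ref, alt))
        if r.isupper() or a.isupper()
    ]


print(changed_positions("cGg/cAg"))
```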
 

STRAND
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 14
Missing (%): < 0.1%
Memory size: 509.4 KiB

-1 32804
1 32370

Value Count Frequency (%)
-1 32804 50.3%
1 32370 49.7%
(Missing) 14 < 0.1%

Length

Max length: 4
Mean length: 3.503221452
Min length: 3

Value Count Frequency (%)
Decimal_Number 2 33.3%
Lowercase_Letter 2 33.3%
Dash_Punctuation 1 16.7%
Other_Punctuation 1 16.7%

Value Count Frequency (%)
Common 4 66.7%
Latin 2 33.3%

Value Count Frequency (%)
ASCII 6 100.0%

LoFtool
Real number (ℝ≥0)

MISSING
Distinct count: 1195
Unique (%): 2.0%
Missing: 4213
Missing (%): 6.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3450584315
Minimum: 6.89e-05
Maximum: 1
Zeros: 0
Zeros (%): 0.0%
Memory size: 509.4 KiB

Quantile statistics

Minimum: 6.89e-05
5-th percentile: 0.00145
Q1: 0.0243
Median: 0.157
Q3: 0.71
95-th percentile: 0.971
Maximum: 1
Range: 0.9999311
Interquartile range (IQR): 0.6857

Descriptive statistics

Standard deviation: 0.3612384434
Coefficient of variation (CV): 1.046890644
Kurtosis: -1.201434912
Mean: 0.3450584315
Median Absolute Deviation (MAD): 0.3223397243
Skewness: 0.6522583053
Sum: 21039.93786
Variance: 0.130493213
Histogram with fixed size bins (bins=10)

Value Count Frequency (%)
0.971 2836 4.4%
0.0896 1953 3.0%
0.782 1909 2.9%
0.00386 1228 1.9%
0.0212 1094 1.7%
0.00207 1075 1.6%
0.116 1068 1.6%
0.0737 905 1.4%
0.965 798 1.2%
0.000276 640 1.0%
Other values (1185) 47469 72.8%
(Missing) 4213 6.5%

Value Count Frequency (%)
6.89e-05 60 0.1%
0.000138 162 0.2%
0.000207 118 0.2%
0.000276 640 1.0%
0.000344 197 0.3%

Value Count Frequency (%)
1 3 < 0.1%
0.999 3 < 0.1%
0.998 133 0.2%
0.997 45 0.1%
0.996 1 < 0.1%

CADD_PHRED
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
Distinct count: 9324
Unique (%): 14.5%
Missing: 1092
Missing (%): 1.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.68561648
Minimum: 0.001
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 509.4 KiB

Quantile statistics

Minimum: 0.001
5-th percentile: 0.07
Q1: 7.141
Median: 14.09
Q3: 24.1
95-th percentile: 34
Maximum: 99
Range: 98.999
Interquartile range (IQR): 16.959

Descriptive statistics

Standard deviation: 10.83635024
Coefficient of variation (CV): 0.690846308
Kurtosis: -0.3988884313
Mean: 15.68561648
Median Absolute Deviation (MAD): 9.19150042
Skewness: 0.3787342923
Sum: 1005385.274
Variance: 117.4264864
Histogram with fixed size bins (bins=10)

Value Count Frequency (%)
34 1431 2.2%
35 1263 1.9%
33 931 1.4%
32 828 1.3%
0.001 509 0.8%
0.002 469 0.7%
31 458 0.7%
23 427 0.7%
23.1 412 0.6%
23.2 405 0.6%
Other values (9314) 56963 87.4%
(Missing) 1092 1.7%

Value Count Frequency (%)
0.001 509 0.8%
0.002 469 0.7%
0.003 212 0.3%
0.004 144 0.2%
0.005 108 0.2%

Value Count Frequency (%)
99 1 < 0.1%
81 1 < 0.1%
79 1 < 0.1%
74 1 < 0.1%
73 1 < 0.1%

CADD_RAW
Real number (ℝ)

HIGH CORRELATION
MISSING
Distinct count: 63803
Unique (%): 99.5%
Missing: 1092
Missing (%): 1.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.554131453
Minimum: -5.477391
Maximum: 46.556261
Zeros: 0
Zeros (%): 0.0%
Memory size: 509.4 KiB

Quantile statistics

Minimum: -5.477391
5-th percentile: -0.70556925
Q1: 0.46295075
Median: 1.6429485
Q3: 4.38139175
95-th percentile: 7.45498775
Maximum: 46.556261
Range: 52.033652
Interquartile range (IQR): 3.918441

Descriptive statistics

Standard deviation: 2.961553499
Coefficient of variation (CV): 1.159514909
Kurtosis: 5.94858469
Mean: 2.554131453
Median Absolute Deviation (MAD): 2.309544805
Skewness: 1.609072348
Sum: 163709.6096
Variance: 8.770799128
Histogram with fixed size bins (bins=10)

Value Count Frequency (%)
1.463563 3 < 0.1%
5.456311 2 < 0.1%
3.363536 2 < 0.1%
1.344155 2 < 0.1%
-0.019198 2 < 0.1%
2.534534 2 < 0.1%
4.437403 2 < 0.1%
0.753203 2 < 0.1%
1.549289 2 < 0.1%
6.273175 2 < 0.1%
Other values (63793) 64075 98.3%
(Missing) 1092 1.7%

Value Count Frequency (%)
-5.477391 1 < 0.1%
-4.682013 1 < 0.1%
-4.472198 1 < 0.1%
-4.450451 1 < 0.1%
-4.314148 1 < 0.1%

Value Count Frequency (%)
46.556261 1 < 0.1%
34.23672 1 < 0.1%
33.935525 1 < 0.1%
32.934203 1 < 0.1%
32.693999 1 < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive scale of the two variables, so for a linear relationship the steepness of the slope does not affect r; only its sign does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
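
The definition above can be sketched in plain Python as follows. This is an illustration of the formula, not the code the report generator uses; the function name is hypothetical. The 1/n factors in the covariance and the standard deviations cancel, so sums are used directly.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    Raises ZeroDivisionError if either input is constant (zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))         # perfectly linear, positive slope
print(pearson_r(x, [10 * v for v in x]))  # steeper slope, same r
print(pearson_r(x, [-v for v in x]))      # negative slope
```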

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
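
A minimal sketch of this calculation follows: rank both variables, then apply Pearson's formula to the ranks. Tied values receive the average of their ranks, which is a common convention assumed here. The function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance over the product of standard deviations (1/n factors cancel)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```

For the cubic relationship in the example, the ranks agree exactly, so ρ reaches its maximum while r stays below it.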

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
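
The pair-counting definition above corresponds to the variant usually called tau-a. A direct sketch follows. It compares every pair of observations, so it runs in O(n²) time and is meant for illustration only; the function name is hypothetical. Library implementations may default to a tie-adjusted variant such as tau-b, which gives different values when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # positive product: both variables move in the same direction
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant, 3 pairs
```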

Missing values

Sample

First rows

index CHROM POS REF ALT AF_ESP AF_EXAC AF_TGP CLNHGVS CLNVC MC CLASS Allele IMPACT EXON cDNA_position CDS_position Protein_position Amino_acids Codons STRAND LoFtool CADD_PHRED CADD_RAW
0 1 1168180 G C 0.0771 0.10020 0.1066 NC_000001.10:g.1168180G>C single_nucleotide_variant SO:0001583|missense_variant 0 C MODERATE 1/1 552 522 174 E/D gaG/gaC 1.0 NaN 1.053 -0.208682
1 1 1470752 G A 0.0000 0.00000 0.0000 NC_000001.10:g.1470752G>A single_nucleotide_variant SO:0001583|missense_variant 0 A MODERATE 4/4 523 509 170 P/L cCg/cTg -1.0 NaN 31.000 6.517838
2 1 1737942 A G 0.0000 0.00001 0.0000 NC_000001.10:g.1737942A>G single_nucleotide_variant SO:0001583|missense_variant,SO:0001623|5_prime_UTR_variant 1 G MODERATE 6/12 632 239 80 I/T aTc/aCc -1.0 NaN 28.100 6.061752
3 1 2160305 G A 0.0000 0.00000 0.0000 NC_000001.10:g.2160305G>A single_nucleotide_variant SO:0001583|missense_variant 0 A MODERATE 1/7 132 100 34 G/S Ggc/Agc 1.0 NaN 22.500 3.114491
4 1 2160305 G T 0.0000 0.00000 0.0000 NC_000001.10:g.2160305G>T single_nucleotide_variant SO:0001583|missense_variant 0 T MODERATE 1/7 132 100 34 G/C Ggc/Tgc 1.0 NaN 24.700 4.766224
5 1 2160554 G C 0.0000 0.00000 0.0000 NC_000001.10:g.2160554G>C single_nucleotide_variant SO:0001583|missense_variant 0 C MODERATE 1/7 381 349 117 G/R Ggc/Cgc 1.0 NaN 23.700 4.079099
6 1 3328358 T C 0.0000 0.00000 0.0000 NC_000001.10:g.3328358T>C single_nucleotide_variant SO:0001583|missense_variant 0 C MODERATE 9/17 1858 1600 534 S/P Tcg/Ccg 1.0 0.101 0.172 -0.543433
7 1 3328659 C T 0.1523 0.13103 0.1060 NC_000001.10:g.3328659C>T single_nucleotide_variant SO:0001583|missense_variant 0 T MODERATE 9/17 2159 1901 634 P/L cCt/cTt 1.0 0.101 23.000 3.424422
8 1 3347452 G A 0.0000 0.00357 0.0030 NC_000001.10:g.3347452G>A single_nucleotide_variant SO:0001583|missense_variant 1 A MODERATE 15/17 3562 3304 1102 V/M Gtg/Atg 1.0 0.101 11.360 1.126629
9 1 5925304 G A 0.0045 0.00231 0.0058 NC_000001.10:g.5925304G>A single_nucleotide_variant SO:0001583|missense_variant 0 A MODERATE 27/30 3942 3674 1225 T/M aCg/aTg -1.0 0.021 22.100 2.969650

Last rows

index CHROM POS REF ALT AF_ESP AF_EXAC AF_TGP CLNHGVS CLNVC MC CLASS Allele IMPACT EXON cDNA_position CDS_position Protein_position Amino_acids Codons STRAND LoFtool CADD_PHRED CADD_RAW
65178 X 154005088 C CAAG 0.0000 0.00000 0.0000 NC_000023.10:g.154005109_154005111dupGAA Duplication NaN 0 AAG MODERATE 15/15 1701-1702 1491-1492 497-498 -/K -/AAG 1.0 NaN 14.500 1.716666
65179 X 154005088 CAAG C 0.0000 0.00000 0.0000 NC_000023.10:g.154005109_154005111delGAA Deletion SO:0001624|3_prime_UTR_variant 1 - MODERATE 15/15 1702-1704 1492-1494 498 K/- AAG/- 1.0 NaN 16.300 2.014044
65180 X 154005148 G A 0.0708 0.17233 0.1383 NC_000023.10:g.154005148G>A single_nucleotide_variant SO:0001624|3_prime_UTR_variant 0 A MODIFIER 15/15 1761 NaN NaN NaN NaN 1.0 NaN 6.255 0.359695
65181 X 154065843 G A 0.0159 0.00689 0.0127 NC_000023.10:g.154065843G>A single_nucleotide_variant SO:0001624|3_prime_UTR_variant 0 A MODIFIER 26/26 7256 NaN NaN NaN NaN -1.0 0.00158 3.007 0.042639
65182 X 154157565 C T 0.0153 0.00473 0.0140 NC_000023.10:g.154157565C>T single_nucleotide_variant SO:0001819|synonymous_variant 0 T LOW 14/26 4671 4500 1500 P ccG/ccA -1.0 0.00158 11.440 1.142527
65183 X 154158201 T G 0.0801 0.13923 0.1605 NC_000023.10:g.154158201T>G single_nucleotide_variant SO:0001819|synonymous_variant 0 G LOW 14/26 4035 3864 1288 S tcA/tcC -1.0 0.00158 0.105 -0.630908
65184 X 154159118 C T 0.0020 0.00060 0.0013 NC_000023.10:g.154159118C>T single_nucleotide_variant SO:0001583|missense_variant 1 T MODERATE 14/26 3118 2947 983 V/I Gta/Ata -1.0 0.00158 0.002 -1.731470
65185 X 154194886 C T 0.0125 0.00370 0.0111 NC_000023.10:g.154194886C>T single_nucleotide_variant SO:0001819|synonymous_variant 0 T LOW 8/26 1257 1086 362 A gcG/gcA -1.0 0.00158 12.850 1.412434
65186 X 154490187 T C 0.0003 0.00034 0.0000 NC_000023.10:g.154490187T>C single_nucleotide_variant SO:0001819|synonymous_variant 0 C LOW 2/2 822 543 181 T acA/acG -1.0 NaN 0.130 -0.592415
65187 X 154508542 G C 0.0019 0.00267 0.0008 NC_000023.10:g.154508542G>C single_nucleotide_variant SO:0001583|missense_variant 0 C MODERATE 6/7 791 532 178 P/A Cca/Gca -1.0 0.14000 0.046 -0.786513